12 research outputs found

    The Early Bird Catches The Term: Combining Twitter and News Data For Event Detection and Situational Awareness

    Twitter updates now represent an enormous stream of information originating from a wide variety of formal and informal sources, much of which is relevant to real-world events. In this paper we adapt existing bio-surveillance algorithms to detect localised spikes in Twitter activity corresponding to real events with a high level of confidence. We then develop a methodology to automatically summarise these events, both by providing the tweets which fully describe the event and by linking to highly relevant news articles. We apply our methods to outbreaks of illness and events strongly affecting sentiment. In both case studies we are able to detect events verifiable by third party sources and produce high quality summaries.

    Disease surveillance using a hidden Markov model

    Background: Routine surveillance of disease notification data can enable the early detection of localised disease outbreaks. Although hidden Markov models (HMMs) have been recognised as an appropriate method to model disease surveillance data, they have been rarely applied in public health practice. We aimed to develop and evaluate a simple flexible HMM for disease surveillance which is suitable for use with sparse small area count data and requires little baseline data. Methods: A Bayesian HMM was designed to monitor routinely collected notifiable disease data that are aggregated by residential postcode. Semi-synthetic data were used to evaluate the algorithm and compare outbreak detection performance with the established Early Aberration Reporting System (EARS) algorithms and a negative binomial cusum. Results: Algorithm performance varied according to the desired false alarm rate for surveillance. At false alarm rates around 0.05, the cusum-based algorithms provided the best overall outbreak detection performance, having similar sensitivity to the HMMs and a shorter average time to detection. At false alarm rates around 0.01, the HMM algorithms provided the best overall outbreak detection performance, having higher sensitivity than the cusum-based methods and a generally shorter time to detection for larger outbreaks. Overall, the 14-day HMM had a significantly greater area under the receiver operator characteristic curve than the EARS C3 and 7-day negative binomial cusum algorithms. Conclusion: Our findings suggest that the HMM provides an effective method for the surveillance of sparse small area notifiable disease data at low false alarm rates. Further investigations are required to evaluate algorithm performance across other diseases and surveillance contexts.

    A Methodological Framework for the Evaluation of Syndromic Surveillance Systems: A Case Study of England

    Background: Syndromic surveillance complements traditional public health surveillance by collecting and analysing health indicators in near real time. The rationale of syndromic surveillance is that it may detect health threats faster than traditional surveillance systems, permitting more timely, and hence potentially more effective, public health action. The effectiveness of syndromic surveillance largely relies on the methods used to detect aberrations. Very few studies have evaluated the performance of syndromic surveillance systems and consequently little is known about the types of events that such systems can and cannot detect. Methods: We introduce a framework for the evaluation of syndromic surveillance systems that can be used in any setting based upon the use of simulated scenarios. For a range of scenarios this allows the time and probability of detection to be determined, and uncertainty is fully incorporated. In addition, we demonstrate how such a framework can model the benefits of increases in the number of centres reporting syndromic data and also determine the minimum size of outbreaks that can or cannot be detected. Here, we demonstrate its utility using simulations of national influenza outbreaks and localised outbreaks of cryptosporidiosis. Results: Influenza outbreaks are consistently detected, with larger outbreaks being detected in a more timely manner. Small cryptosporidiosis outbreaks (<1000 symptomatic individuals) are unlikely to be detected. We also demonstrate the advantages of having multiple syndromic data streams (e.g. emergency attendance data, telephone helpline data, general practice consultation data) as different streams are able to detect different types of outbreaks with different efficacy (e.g. emergency attendance data are useful for the detection of pandemic influenza but not for outbreaks of cryptosporidiosis). We also highlight that for any one disease, the utility of data streams may vary geographically, and that the detection ability of syndromic surveillance varies seasonally (e.g. an influenza outbreak starting in July is detected sooner than one starting later in the year). We argue that our framework constitutes a useful tool for public health emergency preparedness in multiple settings. Conclusions: The proposed framework allows the exhaustive evaluation of any syndromic surveillance system and constitutes a useful tool for emergency preparedness and response.

    Epidemic Surveillance Using an Electronic Medical Record: An Empiric Approach to Performance Improvement

    Background: Electronic medical records (EMR) form a rich repository of information that could benefit public health. We asked how structured and free-text narrative EMR data should be combined to improve epidemic surveillance for acute respiratory infections (ARI). Methods: Eight previously characterized ARI case detection algorithms (CDA) were applied to historical EMR entries to create authentic time series of daily ARI case counts (background). An epidemic model simulated influenza cases (injection). From the time of the injection, cluster-detection statistics were applied daily on paired background+injection (combined) and background-only time series. This cycle was then repeated with the injection shifted to each week of the evaluation year. We computed: a) the time from injection to the first statistical alarm uniquely found in the combined dataset (Detection Delay); b) how often alarms originated in the background-only dataset (false-alarm rate, or FAR); and c) the number of cases found within these false alarms (Caseload). For each CDA, we plotted the Detection Delay as a function of FAR or Caseload, over a broad range of alarm thresholds. Results: CDAs that combined text analyses seeking ARI symptoms in clinical notes with provider-assigned diagnostic codes in order to maximize the precision rather than the sensitivity of case-detection lowered Detection Delay at any given FAR or Caseload. Conclusion: An empiric approach can guide the integration of EMR data into case-detection methods that improve both the timeliness and efficiency of epidemic detection.
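The injection-based evaluation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the detector below is a simple trailing-window z-score standing in for the cluster-detection statistics the abstract mentions, and the names `alarms`, `evaluate`, `window` and `threshold` are hypothetical. It shows only how Detection Delay and FAR could be derived from a background series and a combined (background plus injection) series.

```python
from statistics import mean, stdev

def alarms(series, window=7, threshold=3.0):
    """Return the set of day indices whose count exceeds the trailing-window
    mean by more than `threshold` standard deviations.
    Assumption: a stand-in detector, not the statistics used in the paper."""
    flagged = set()
    for day in range(window, len(series)):
        baseline = series[day - window:day]
        mu = mean(baseline)
        sigma = max(stdev(baseline), 1.0)  # floor keeps flat baselines from over-alarming
        if (series[day] - mu) / sigma > threshold:
            flagged.add(day)
    return flagged

def evaluate(background, injection, start, window=7, threshold=3.0):
    """Return (detection_delay, false_alarm_rate) for one injection placement."""
    # Build the combined series: background plus simulated cases from `start`.
    combined = list(background)
    for offset, extra in enumerate(injection):
        if start + offset < len(combined):
            combined[start + offset] += extra

    background_alarms = alarms(background, window, threshold)
    combined_alarms = alarms(combined, window, threshold)

    # Detection Delay: first alarm found only in the combined series,
    # counted in days from the start of the injection.
    unique = sorted(d for d in combined_alarms - background_alarms if d >= start)
    delay = unique[0] - start if unique else None

    # FAR: share of evaluated days that alarm in the background-only series.
    far = len(background_alarms) / (len(background) - window)
    return delay, far

background = [10, 12, 11, 9, 10, 13, 11, 10, 12, 11,
              9, 10, 12, 11, 10, 11, 12, 10, 9, 11]
delay, far = evaluate(background, injection=[0, 8, 15, 30], start=10)
print(delay, far)
```

Repeating `evaluate` over a range of `threshold` values and injection start days would give the Detection Delay versus FAR trade-off that the abstract describes plotting.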

    Epidemic features affecting the performance of outbreak detection algorithms

    Background: Outbreak detection algorithms play an important role in effective automated surveillance. Although many algorithms have been designed to improve the performance of outbreak detection, few published studies have examined how epidemic features of infectious disease impact on the detection performance of algorithms. This study compared the performance of three outbreak detection algorithms stratified by epidemic features of infectious disease and examined the relationship between epidemic features and performance of outbreak detection algorithms. Methods: Exponentially weighted moving average (EWMA), cumulative sum (CUSUM) and moving percentile method (MPM) algorithms were applied. We inserted simulated outbreaks into notifiable infectious disease data in the China Infectious Disease Automated-alert and Response System (CIDARS), and compared the performance of the three algorithms with optimized parameters at a fixed false alarm rate of 5%, classified by epidemic features of infectious disease. Multiple linear regression was adopted to analyse the relationship of the algorithms’ sensitivity and timeliness with the epidemic features of infectious diseases. Results: The MPM had better detection performance than EWMA and CUSUM across all simulated outbreaks, with or without stratification by epidemic features (incubation period, baseline counts and outbreak magnitude). The epidemic features were associated with both sensitivity and timeliness. Compared with long incubation, short incubation had a lower probability of detection (β* = −0.13, P < 0.001) but needed a shorter time to detect outbreaks (β* = −0.57, P < 0.001). Lower baseline counts were associated with higher probability (β* = −0.20, P < 0.001) and longer time (β* = 0.14, P < 0.001). The larger outbreak magnitude was correlated with higher probability (β* = 0.55, P < 0.001) and shorter time (β* = −0.23, P < 0.001). Conclusions: The results of this study suggest that the MPM is a preferable algorithm for outbreak detection and that differences in detection performance across epidemic features should be considered in automatic surveillance practice.
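Two of the algorithms named in this abstract, EWMA and CUSUM, can be sketched in their textbook control-chart form. The Python below is a minimal illustration, not the configuration used in the study: the smoothing weight `lam`, the limits `k` and `h`, the reset-after-alarm rule, and the assumption of a known baseline mean and standard deviation are all choices made here for the example. The moving percentile method is not shown.

```python
def ewma_alarms(counts, baseline_mean, baseline_sd, lam=0.3, k=3.0):
    """Flag days where the EWMA statistic exceeds its upper control limit.
    Assumption: asymptotic control limit with a known baseline mean and SD."""
    limit_sd = baseline_sd * (lam / (2 - lam)) ** 0.5  # asymptotic SD of the EWMA
    z = baseline_mean  # start the smoothed value at the baseline
    flagged = []
    for day, x in enumerate(counts):
        z = lam * x + (1 - lam) * z
        if z > baseline_mean + k * limit_sd:
            flagged.append(day)
    return flagged

def cusum_alarms(counts, baseline_mean, baseline_sd, k=0.5, h=4.0):
    """Flag days where the one-sided upper CUSUM exceeds h (in SD units).
    Assumption: the sum is reset to zero after each alarm."""
    s = 0.0
    flagged = []
    for day, x in enumerate(counts):
        # Accumulate standardized excess over the baseline, less the slack k.
        s = max(0.0, s + (x - baseline_mean) / baseline_sd - k)
        if s > h:
            flagged.append(day)
            s = 0.0
    return flagged

counts = [10, 11, 9, 10, 12, 19, 22, 25]
print(ewma_alarms(counts, baseline_mean=10, baseline_sd=2))
print(cusum_alarms(counts, baseline_mean=10, baseline_sd=2))
```

Raising `k` or `h` lowers the false alarm rate at the cost of slower detection, which is the trade-off the study controls by fixing the false alarm rate at 5%.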